YEARLY PROGRESS REPORT


The  DURIP  cluster  has  rapidly  become  a  mainstay  of  our  research  in  the  following  areas: 

Quicksilver.  This  effort  is  building  a  new  system  for  scalable  distributed  computing.  The  basic 
problem  is  common  in  GIG  and  NCES  systems,  where  an  acute  need  has  arisen  for  simple  tools  to 
assist  the  developer  of  a  distributed  service  that  will  be  shared  by  huge  numbers  of  client  systems  in  a 
networked  environment.  Headed  by  Professor  Ken  Birman,  the  project  is  exploring  a  novel  fusion 
of  classical  protocols  for  reliable  multicast  communication  with  a  new  style  of  peer-to-peer  protocol 
called  scalable  “gossip”.  The  basic  idea  is  to  implement  a  communication  platform  using  these  new 
protocols,  and  then  integrate  the  platform  with  standard  Web  Services  tools  and  technologies  to 
achieve  a  uniquely  easy  to  use,  scalable  and  robust  solution. 

Quicksilver currently has three sub-efforts that rely heavily on the cluster. The first project focuses
on what we are calling a “scalable services architecture.” In this work we explore a new
approach to building high-performance, scalable, self-managed distributed services that can be (more
or less) dragged and dropped onto the cluster. The developer doesn’t need any special
distributed computing experience or skills: our software takes a conventional Web Services
application (or a set of them) as input, uses the WSDL document to deduce a replication strategy,
and then automatically handles load balancing, restart after failure, workload partitioning and other
management and control tasks. The cluster is our primary development target: we expect to reach a
point at which relatively unskilled developers are able to put new mission-critical applications on the
cluster at the click of a button. This work is in C on Linux.

Key  technical  ideas  include  the  use  of  gossip  for  dynamic  membership  tracking,  inconsistency 
discovery  and  state  reconciliation,  partitioning,  chain  replication  and  fault-discovery.  The  cluster  is 
key  here:  our  work  seeks  to  build  the  runtime  environment,  demonstrate  the  methodology  and 
evaluate  it  experimentally  in  realistic  military  GIG  scenarios  under  stress  typical  of  real  deployments. 
We’ll  make  the  solutions  available  to  our  colleagues  at  AFRL  in  Rome  NY. 
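
To illustrate how gossip can support membership tracking and state reconciliation, the following is a minimal sketch only. It is not the Quicksilver implementation, which this report does not describe in detail. It assumes a simple push-style exchange in which each node keeps a heartbeat counter per known member and reconciles views by keeping the highest counter seen. All names (merge, gossip_round, simulate) are hypothetical.

```python
import random


def merge(local, remote):
    """Reconcile two membership views: keep the highest heartbeat per node."""
    for node, beat in remote.items():
        if beat > local.get(node, -1):
            local[node] = beat


def gossip_round(views, rng):
    """One round: each node bumps its own heartbeat, then pushes its
    whole view to one randomly chosen peer (assumed push-only gossip)."""
    nodes = sorted(views)
    for node in nodes:
        views[node][node] += 1
        peer = rng.choice([n for n in nodes if n != node])
        merge(views[peer], views[node])


def simulate(n_nodes=16, seed=0, max_rounds=100):
    """Start with every node knowing only itself; gossip until every
    node has learned the full membership or max_rounds is reached."""
    rng = random.Random(seed)
    names = [f"node{i}" for i in range(n_nodes)]
    views = {name: {name: 0} for name in names}
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(len(view) == n_nodes for view in views.values()):
            return round_no, views
    return None, views


if __name__ == "__main__":
    rounds, _ = simulate()
    print("rounds until every node knows every member:", rounds)
```

In a sketch like this, a member whose heartbeat stops advancing in a node's view for some time could be suspected as failed, which is one common way gossip is used for fault discovery. The timeout policy is left out here because the report does not specify one.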

A second project adopts a similar approach but with a focus on time-critical services. Using a new
form of forward error correction, this activity seeks to support a new kind of time-critical or
real-time replication technology that includes support for deadline-driven communication, periodic
communication, and guaranteed low-latency responsiveness even in the face of load bursts or
failures. We believe we can slash response times by as much as two orders of magnitude. Again, the
methodology is heavily experimental. This work is in Java on Linux, and again uses Web Services
paradigms and standards.
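
The general idea behind forward error correction can be shown with a small sketch. The report does not specify the coding scheme used in this project, so the example below substitutes the simplest possible one, a single XOR parity packet per group of equal-length packets, purely as an assumption for illustration. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Build one repair packet as the XOR of all packets in the group."""
    return reduce(xor_bytes, packets)


def recover(received, parity):
    """Rebuild a group in which at most one packet was lost.

    `received` is the group in order, with None marking a lost packet.
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair at most one loss per group")
    if not missing:
        return list(received)
    present = [p for p in received if p is not None]
    # XOR of the parity with every surviving packet yields the lost one.
    repaired = reduce(xor_bytes, present, parity)
    out = list(received)
    out[missing[0]] = repaired
    return out


if __name__ == "__main__":
    group = [b"abcd", b"efgh", b"ijkl"]
    parity = make_parity(group)
    print(recover([group[0], None, group[2]], parity))
```

The appeal for time-critical delivery is that the receiver repairs a loss locally from the repair packet, without waiting a round trip for a retransmission from the sender.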

A third project focuses on scalable reliable event delivery, messaging and notification. Working with
the virtual synchrony model, and implementing in C# on Windows .NET, we are building a
tremendously scalable high-speed communications infrastructure that remains stable and
self-managed even under intense stress and even with vast numbers of clients (most likely running on
Windows PCs using Web Services interfaces and standards).

Scalable  event  filtering  and  data  mining.  This  work  is  just  getting  underway  (it  required  a  large 
storage  capability,  and  only  just  became  practical).  The  emphasis  is  on  experimentally  exploring  new 
technologies  for  building  high  speed,  scalable,  data  processing  systems  in  which  a  variety  of 
probabilistic  techniques  are  exploited  to  improve  scalability  and  performance.  We  expect  to 
transition  our  solutions  to  projects  like  the  AFRL  JBI,  which  centers  on  a  massive  information 
repository  and  requires  a  high  speed  content  filtering  technology  to  achieve  good  scalability  and 
performance. 


STATUS  OF  EFFORT 


The  hardware  has  arrived  in  its  entirety  and  has  been  installed. 

ACCOMPLISHMENTS 

The final stages of hardware implementation were completed within the last 6 months. This
included the software design, with a focus on the overall control environment of nodes and switches,
as well as network and node configuration according to the specification of the emulation.
No personnel are supported under this grant.

• Prof. Johannes E. Gehrke

•  Prof.  Emin  Gun  Sirer 

•  Dr.  Robbert  van  Renesse 

•  Dr.  Werner  H.  Vogels 

PUBLICATIONS 

Slingshot:  Time-Critical  Multicast  for  Clustered  Applications.  Mahesh  Balakrishnan,  Stefan 
Pleisch,  Ken  Birman.  To  Appear  in  IEEE  Network  Computing  and  Applications  2005  (NCA  05). 
Boston,  MA. 

At this time we have one paper that has utilized the cluster. This is due in large part to the cluster’s
relatively recent deployment, coupled with the configuration and troubleshooting period during the
installation phase. However, multiple users currently plan to use the
technology, and we fully expect to publish numerous papers over the next several years.

INTERACTION/TRANSITIONS 

During this period we continued our interaction with Dr. Mark Linderman to achieve three goals:

• Involvement of the JBI team in the requirements and design phase of the system.

•  Investigation  of  experimentation  scenarios  executed  by  Cornell  researchers  that  are  relevant 
to  the  JBI  effort. 

•  Investigation  of  ways  that  the  JBI  team  can  access  the  Testbed  for  running  unclassified 
scalability  scenarios. 


